Chromosome-level genome and the identification of sex chromosomes in Uloborus diversus

Abstract The orb web is a remarkable example of animal architecture that is observed in families of spiders that diverged over 200 million years ago. While several genomes exist for araneid orb-weavers, none exist for other orb-weaving families, hampering efforts to investigate the genetic basis of this complex behavior. Here we present a chromosome-level genome assembly for the cribellate orb-weaving spider Uloborus diversus. The assembly reinforces evidence of an ancient arachnid genome duplication and identifies complete open reading frames for every class of spidroin gene, which encode the proteins that are the key structural components of spider silks. We identified the two X chromosomes of U. diversus and candidate sex-determining loci. This chromosome-level assembly will be a valuable resource for evolutionary research into the origins of orb-weaving, spidroin evolution, chromosomal rearrangement, and chromosomal sex determination in spiders.

accuracy and clarity of the manuscript. We also thank the Reviewers for their positive comments on our study.
We address the specific points raised by the Reviewers below:

Reviewer #1
This paper presents the first uloborid spider genome--and it is a chromosome level assembly. Genomes of this family are important because the orb web is supposedly independently and convergently evolved in this group. Although my expertise is not in the technology and informatics of genome sequencing, it appears to be well done.
We thank the reviewer for their enthusiasm and insightful suggestions. We have made the suggested changes, as outlined below.
Table S1 Number of Componenet Sequences--typo
Thank you for catching these errors. They have been corrected in Figure 1 and Table S1.
Text single exon We found a --typo
Thank you for drawing our attention to this typo. The correct punctuation has been added on line 460.
can be ascribed by --can be inferred by?
Thank you for this suggestion. This introductory sentence has been rewritten to provide clearer language on line 478. In doing so, this choice of words has been removed.
an Araneid orb-weaver--araneid usually not capitalized
We appreciate this correction. The change has been made to the text on line 508. In addition, we identified three additional occurrences of the same mistake, on lines 24 (found in the Abstract), 317 (found in the AcSp section), and 666 (found at the end of the Whole Genome Duplication section in Discussion), which have also been corrected.
Thank you so much for advising us of the troublesome nature of this work. We have removed the reference and any associated results in our compiled tables.
Re methods, it would be of interest to know what HMW DNA fragment sizes were (expressed as kb, or mb), although Tape Stations are not very accurate. For people who collect spiders with the intent to yield HMW DNA, such data are important. Data are scarce, so any facts are significant.
Thank you for bringing our attention to this omission. A statement regarding the average integrated percent of mass accounted for by the major peak in the TapeStation results and the fragment length around which the peak is centered has been added on lines 730 and 731.
Any homologs of the Pyriform spidroin (PySp) in Acanthoscurria? Piriform silk attachment points are a synapomorphy of araneomorph or "true" spiders. Liphistiomorph and mygalomorph spiders do not (cannot?) make point attachments, and the inability to make point attachments either to substrate or silk-silk point attachments probably constrains/ed the evolution of web architectures in nonaraneomorph spiders. Therefore finding homologs to PySp spidroins in nonaraneomorph spiders is of great interest to explain araneomorph web architecture diversity.
Thank you, yes this is an excellent point. We initially held off on too much spidroin analysis since we were aware of another manuscript on U. diversus spidroins [1], and had agreed with the first author that we would avoid too much analysis in this area to avoid scooping them. However, since the paper was recently published, we have included some commentary on this on lines 450 to 455, and have cited their paper.
Likewise, tubuliform spidroin (TuSp) is probably a synapomorphy of entelegyne spiders, with derived female genitalia--a "flow-though" sperm management system. Eggsacs occur widely in non-entelegyne spiders, so it is a mystery why entelegynes have specialized spigots, glands, and spidroins for the same purpose. Indeed, the particular function of tubuliform silk is not clear. Any thoughts on this?
Thank you, this is an interesting point about entelegyne spiders. We can only speculate, but compared to mygalomorphs, which primarily lay their egg sacs in burrows that are well protected from the elements, araneomorphs produce egg sacs that are more exposed and are either hung on webs, placed on terrestrial substrates, or carried by the spiders. Tubuliform silk may have evolved to provide enhanced protection for the eggs relative to mygalomorph egg sacs, which rely on the burrow for added protection. This is highly speculative though, and we are unsure whether it is appropriate to include in the main text.
It is good to see attention paid to the mitochondrial genome, as many whole genome studies ignore it. In spiders, early work claimed that tRNA's appeared to be peculiar. Masta and Boore. 2004
Thank you for the suggestion. In the original text we did state the tRNAs we identified also lacked the 3′ aminoacyl acceptor stems (T-arms), but we have added a few sentences to clarify this odd observation on lines 262 to 268. Shrunken tRNAs occur in other animals as well, such as nematodes. However, why this occurs is currently unknown.
Finally, any comments on evidence for or against the convergent evolution of the orb web? Homology between the pseudoflagelliform and flagelliform spidroins would be pertinent. The intro does raise expectations that some of the macro / larger evolutionary questions will be addressed in the paper, but many, see above, are only cursory or not too much. Perhaps include a sentence in intro acknowledging this, but saying that this paper intends to present the genome and address sex chromosomes, but other topics? For example the sections on some of the spidroins do not extensively discuss comparisons with other spider genomes.
Thank you, yes we were vague on this point, and added text to address this in the spidroin and discussion sections (lines 332 to 335 and 634 to 649). As mentioned earlier, we were a little too circumspect on this point due to a desire not to overlap too much with Correa-Garhwal's work, but now that it's published, we feel free to discuss this at greater length.

Reviewer #2
In this study, the authors generated huge genome sequencing data and RNA-seq data and provided a genome assembly with rather complicated merging approach, of a spider with novel phylogenetic position. The genome undoubtedly added novel and important resources for deep understanding of spider evolution. However, there are still severe issues that need to be addressed. 1. There are huge sequencing data from different samples. However, I don't think that marge of different assemblies is good for a final qualified genome. Given high heterozygosity, that illumina data and ONT data from different individuals is quite difficult to use for assembling a clean genome. As shown in Table 2, assembly by Hify approach is not obviously inferior compared with the merged one, but obviously much better in avoiding redundancy. I strongly suggest that the author adopt the genome assembly of Hify data from one individual, instead of merging two sets of assemblies. Illumina and Nanopore assembly may be helpful in fully deciphering silk proteins.
Thank you, we apologize for the confusion on this point. The final assembly is ultimately based on the single-individual PacBio HiFi data. We used the assembly of ONT and Illumina data to patch gaps in the scaffolds produced from PacBio HiFi data. Overall, the gap closing process patched 2617 gaps, inserting 31.4 Mbp of sequence, which amounts to only about 1.5% of sequence added to the HiFi assembly. The small amount of sequence added from the Illumina+ONT assembly improved the contiguity and likely resolved repetitive regions in places where the HiFi reads failed to span them. We expect that the additional sequence added to the assembly had no impact on our conclusions.
We have added language to clarify the process by which the SAMBA tool improves the PacBio IPA assembly, removing the misleading term "merge" and replacing it with the clarifying phrase "patch gaps", as well as the quantification of gaps patched, sequence length added, and percent of the genome for which this added sequence accounts on lines 182 to 186.
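The proportion quoted above can be rechecked with a one-line calculation. The following is a minimal sketch, not part of our pipeline: it assumes the 2.1 Gbp span reported for the HiFi (IPA) assembly is the appropriate denominator, and the function name is hypothetical.

```python
def patched_fraction(inserted_bp: float, assembly_bp: float) -> float:
    """Fraction of the assembly span contributed by gap-patching sequence."""
    return inserted_bp / assembly_bp


# 31.4 Mbp inserted by gap patching; 2.1 Gbp is the assumed HiFi assembly span.
frac = patched_fraction(31.4e6, 2.1e9)
print(f"Sequence added by gap patching: {frac:.1%}")
```

Under these assumptions the result rounds to the "about 1.5%" stated in the response.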
2. Proportion of repeats are somewhat affected by the quality of assembly. The high heterozygous genome assembly is complicated merged by diverse batch of data, so the real quality might be not as good as the author described. The quality of repeat is especially hard to evaluate. Hence the statements on genome size (Line 193-200) are not convictive.
Thank you, and again we apologize for the confusion. On lines 190 to 198 we described the statistics of the assembly. We agree that draft genome assemblies typically do not capture all of the sequence in a genome; for example, highly repetitive subtelomeric and centromeric regions are typically missing. Although draft genomes have limitations, they have proven incredibly valuable as the basis for an understanding of the genetics and evolution of hundreds of plants, animals, and other species. We used the sequencing data that we had generated within our budget to the best extent possible to produce a contiguous de novo assembly whose quality exceeds most other published arachnid genomes. In addition, the assembled genome size is consistent with that predicted by GenomeScope.
3. About the assembly of RNA-seq data. The authors get huge amounts of data. However, it is not so helpful to obtain novel transcripts if the data is saturated. More importantly, assembly of short reads is even not so useful to obtain long transcripts.
We agree that de novo assembly of short-read Illumina RNA-seq data is an error-prone process, and the data were generated before isoSeq was affordable. However, when a high quality genome is available, as is the case for the present study, transcript assembly becomes a much easier problem. We used the Trinity software package [2] to assemble transcripts from RNA-seq data guided by the genome assembly. Trinity is quite capable of assembling full length transcripts with the help of a well-assembled genome.
4. As to whole genome duplication. The authors did not provided solid evidence supporting that WGD occurred in U. diversus genome. They only demonstrated two hox clusters therein. The synteny analysis was quite confusing which is not helpful in confirmation of WGD. They need to provide more solid genome-wide evidence, or otherwise totally downplay the statements.
We apologize for this, and have downplayed our statements. Our observations are consistent with prior work on WGD. We included synteny maps to show that while large syntenic portions are shared across the genome, considerable chromosomal rearrangements have occurred since the duplication event. In other systems that have diverged more recently, such as domestic plants [3,4] or humans and chimpanzees [5], duplications or fusion events can be observed based on large syntenic regions. However, in spiders, this is not observed, and is similar to what is observed in reptiles and birds, where considerable rearrangements have occurred [6].
5. The identification of the sex chromosome is still vague. The statements are not well organized. The statements and the results are so vague and not convictive. "While 8 of the 10 pseudochromsomes had a median read depth of 40 ± 2, pseudochromosomes 3 and 10 were outliers, with read depths of 36 and 33, respectively." The difference in sequencing depth is rather convictive. As I know the authors sequenced female and male samples. So why they didn't clearly compare the depth of the two sex chromosomes between them and make more evidence?
Thank you, we apologize for the poor communication and have added sentences to clarify the analyses. As you thoughtfully suggested, we also included read depth coverage for the female sample, and added additional comparative synteny maps for the predicted sex chromosomes. As for the read depth disparity between male and female samples, it is consistent with what was observed in A. bruennichi. We provide 3 possible reasons for why we might observe this discrepancy. It is likely that the read depth difference between males and females is smaller than expected because of the several syntenic blocks shared between the sex chromosomes and autosomes. The Hox cluster is a good example. Since Ubx is on chromosomes 5 and 10, the expected male read depth for reads that map to this region would be 75% of the female read depth (males: haploid for 10, diploid for 5; females: diploid for both 5 and 10).
However, we think the sum total of read depth and synteny analysis are all consistent with chromosomes 3 and 10 being the sex chromosomes.
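The 75% figure in the Ubx example follows from simple copy counting. The sketch below is illustrative only: the function name is hypothetical, and it assumes that reads from paralogous copies map interchangeably to either locus and that every copy is sequenced to the same depth.

```python
def male_to_female_depth_ratio(autosomal_loci: int, x_linked_loci: int) -> float:
    """Expected male:female read depth ratio for a sequence present at the
    given number of autosomal and X-linked loci.

    Males carry two copies of each autosomal locus and one copy of each
    X-linked locus; females carry two copies of both.
    """
    male_copies = 2 * autosomal_loci + 1 * x_linked_loci
    female_copies = 2 * autosomal_loci + 2 * x_linked_loci
    return male_copies / female_copies


# Ubx: one autosomal copy (chromosome 5) and one X-linked copy (chromosome 10).
ubx_ratio = male_to_female_depth_ratio(autosomal_loci=1, x_linked_loci=1)

# A sequence found only on an X chromosome gives the textbook expectation.
x_only_ratio = male_to_female_depth_ratio(autosomal_loci=0, x_linked_loci=1)

print(ubx_ratio, x_only_ratio)
```

Regions shared between an X chromosome and an autosome therefore pull the observed male depth above the 50% expected for strictly X-linked sequence.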
Other: 1. The information of chromosome-level spider genome are not Incomplete. As I know, there is a black widow genome with chromosome-level. The authors need to added this one.
Thank you for bringing this discrepancy to our attention. We have added this reference and cited this genome and included the stats on our table summarizing the genome statistics of published genomes. We also included synteny comparisons for this genome in Figure 5 and Figure 6.
2. The authors need to release the sequences of the spidroins the identified and described.
Thank you for the suggestion. While we intended to release these sequences in the public databases as soon as possible following publication, we have included these sequences in a supplementary file (Spidroin_sequences.txt).
written.
Thank you for your kind comments.
Minor comments: 1. In the Figure 1B, I noticed that it noted the estimated divergence times of the Araneae, I think there should be add the reference, or detail describe how to do.
Thank you for bringing this to our attention. We have added the appropriate reference to the Figure 1B Legend on lines 998 and 999.
2. There is something wrong with the table format, such as Table1, 2, 5 and Table 6
Thank you for pointing this out. Tables 2 and 4 have been removed from their in-line placement in the text and will be submitted as Excel files, due to their size. This remedies the issues with formatting for Tables 2 and 4. The spacing and alignments in the remaining tables have been adjusted to correct formatting in their current in-line placement within the text.
Thank you for pointing out this typographical error. The errant space has been removed on line 73.
4. Line 147 to lines 148: Line breaks error.
Thank you for catching this formatting error. The inappropriate carriage return has been removed on line 159.
Thank you for catching this. We have made the correction to the punctuation on line 526, addressing the placement of the reference.
6. Line 511-512: In the genome of spider Uloborus diversus, which chromosome the genes of "sex lethal (sxl)" and "doublesex (dsx)" located at?
We apologize for the misunderstanding here. As we stated in the text, sex-lethal is the insect sensor for chromosome dosage compensation and triggering sex-specifying genetic circuits, but it is unique to insects, and a homologue has not been found in other spider genomes, including U. diversus. We found several doublesex homologues in several chromosomes, but there is no reason to expect them to be on the X chromosomes. In flies, doublesex is not on the X chromosome (it's on chromosome 3). In fact, most sex-specifying genes are not on sex chromosomes. Also, most doublesex homologues (Dmrt genes) are not involved in sex-determination in flies and other animals, so determining which U. diversus homologue is involved in sex-determination would involve considerable work. Even then, there would be no expectation to find it on the X-chromosome.
However, for chromosome-based sex determination, a chromosome dosage sensor is necessary to determine the embryo's zygosity, and differentially trigger sex-determining gene circuits (located on other chromosomes) based on chromosome dosage. Different mechanisms for sensing chromosome dosage and for dosage compensation have evolved independently in different animal phyla [7]. However, our knowledge of this is limited to nematodes, insects, and placental mammals, and in all three cases, these animals possess a single X chromosome. How dosage compensation occurs in multiple X chromosome animals (such as spiders, monotremes, etc.) is unknown. We hope our work can help contribute to this line of research.
7. Line 515-516: "The 534 shared sex-linked genes in these three species, 14 are predicted to be DNA/RNA-binding", if these sex-linked genes have difference on RNA level between male and female?
Thank you, this is a great question. Unfortunately, we do not have sufficient sample numbers or read depths for the single samples used for the transcriptome assembly to make any statistically relevant statements regarding any differences which might exist in the expression of these 14 genes. This is something we would like to address in the future. For the purposes of this publication, we focused on obtaining a sufficient transcriptome assembly to assist in identifying gene models with BRAKER and provide some annotations. Another challenge is that RNA levels might be the same, but there could be differences in splicing. For example, in insects, sex-lethal is alternatively spliced in males and females, which reinforces the protein expression disparity between sexes. Alternatively, the chromosome dosage sensor may not be a protein-encoding gene. For example, in mammals, the master regulator is Xist, which is a non-coding RNA.
Thank you for your attention to this detail. We have changed the formatting from italic to bold for this subsection heading on line 793.
9. Line 764: "We then used the Trinity assembler43 v.2.12.0", the number of 43 may be redundancy.
Thank you for catching this typo. We have removed the number 43 from line 873.
Thank you for noticing this. We have added RRIDs for every software with an available RRID at this time. There are two software packages for which RRIDs were not available: Mitos2 and Pseudohaploid. Otherwise, all RRIDs have been provided.
11. Lines 780 "using the BRAKER 2 pipeline" changes to "using the BRAKER2 pipeline".
Thank you for pointing this error out. We have removed the space indicated on line 891.
12. Lines 950: "Literature Cited" changes to "Reference".
We appreciate this suggestion. "Literature Cited" has been changed to "References" on line 1076.
13. Lines 952-953: wrong cite. The World Spider Catalog is a web online, the version and the data you accessed from should also added, and the author's name should change to World Spider Catalog.
Thank you for catching this error. The appropriate changes have been made to address this error in the reference formatting.

The orb-web is a remarkable example of animal architecture that is observed in families of spiders that diverged over 200 million years ago. While several genomes exist for araneid orb-weavers, none exist for other orb-weaving families, hampering efforts to investigate the genetic basis of this complex behavior. Here we present a chromosome-level genome assembly for the cribellate orb-weaving spider Uloborus diversus. The assembly reinforces evidence of an ancient arachnid genome duplication and identifies complete open reading frames for every class of spidroin gene, which encode the proteins that are the key structural components of spider silks. We identified the two X chromosomes of U. diversus and candidate sex-determining loci. This chromosome-level assembly will be a valuable resource for evolutionary research into the origins of orb-weaving, spidroin evolution, chromosomal rearrangement, and chromosomal sex-

Spiders are among the most successful and diverse terrestrial predators on Earth. Almost 400 million years of evolution has produced more than 50,000 extant spider species representing 128 families that are distributed over every continent except Antarctica [1]. The success of these animals is due in part to the diversity of behaviors that have evolved to capture prey in different environments [2]. Many spiders attack their prey by physically grabbing and immobilizing them with venom and use their silk exclusively for egg sacs. Others use silk to line their burrows, or

To limit genetic variation, we used sequencing data from only 5 unmated female spiders in our assembly. We used Illumina to obtain high fidelity read data from a single female spider, generating 795M 150 bp read pairs totaling 119.3 Gb.
Because Illumina short reads are not sufficiently long to span long, highly repetitive regions encountered in spider genomes, we sequenced 3 ONT libraries, each from a single female, generating 14.7M reads totaling 98.4 Gb, with a read N50 of 6.7 kb. To obtain long sequencing reads with high sequence fidelity, we also sequenced a single adult female using PacBio HiFi, generating 35M subreads totaling 412.8 Gb with a read N50 of 12.8 kb, which yielded 2M consensus reads totaling 26 Gb with a read N50 of 13.0 kb. To investigate the sex determination system in U. diversus, we also generated an Illumina library from a single adult male, producing 937M 50 bp read pairs totaling 46.8 Gb. Sequencing library statistics are available in Table 1.

To further improve our assembly, we sequenced using PacBio HiFi and assembled the resulting HiFi reads with PacBio's IPA (Improved Phased Assembly) pipeline, resulting in a substantial decrease in the new assembly span, from 2.8 Gbp to only 2.1 Gbp. The number of scaffolds in this assembly was only 9,734, a remarkable improvement. The scaffold N50 in the IPA assembly was 328,082 bp and the scaffold L50 in the IPA assembly was 1,789 (Table 2). The BUSCO score for the IPA assembly indicated that this assembly contained 92.3% of BUSCOs complete (83.7% in single copy, 9.3% duplicated) (

in the assembly; however, the total number of scaffolds was reduced by 78% to only 1,586 scaffolds, with a remarkable improvement in scaffold N50, which increased to 185,519,777 bp in the Hi-C scaffolded assembly (Table 2, Figure 3C). Most importantly, 88% of the total assembly was represented by 10 large scaffolds that comprise 1.9 Gbp (Figure 3B), matching the expected number of chromosomes (Figure 3A). The BUSCO score for the final assembly showed 94.8% of the BUSCOs were complete (with 85.2% in single copy, 9.6% duplicated) (Table 3) Table S1).
We will refer to these 10 scaffolds as pseudochromosomes.
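For readers less familiar with the contiguity statistics quoted above, the sketch below shows how scaffold N50 and L50 are conventionally computed from a list of scaffold lengths. It is an illustration only: the function name and the toy lengths are hypothetical and are not taken from the assembly.

```python
def n50_l50(lengths):
    """Return (N50, L50) for a collection of scaffold lengths.

    N50 is the length of the scaffold at which the running total, taken
    from longest to shortest, first reaches half of the total assembly
    span. L50 is the number of scaffolds needed to reach that point.
    """
    ordered = sorted(lengths, reverse=True)
    half_span = sum(ordered) / 2
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half_span:
            return length, count
    raise ValueError("no scaffold lengths given")


# Toy example: total span 30, so half the span is reached at the second scaffold.
toy_n50, toy_l50 = n50_l50([10, 8, 6, 4, 2])
print(toy_n50, toy_l50)
```

A higher N50 together with a lower L50 indicates that more of the assembly sits in fewer, longer scaffolds, which is the sense in which the Hi-C scaffolded assembly improves on the IPA assembly.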

Repeat Annotation
To characterize repetitive sequences, we constructed a species-specific repeat library using RepeatModeler2[55]. This library was used in conjunction with the RepBase RepeatMasker Edition [56] database for masking the genome. RepeatMasker analysis of the combined U. diversus and RepBase repeats masked 66.6% of the final U. diversus genome assembly. Many (29.27%) of the repetitive regions were unclassified; however, DNA transposons accounted for a similar proportion (22.73%). Retroelements accounted for a much smaller proportion (7.7%). Total interspersed repeats account for 59.7% and simple repeats cover 2.61% of the genome (Table 4). The disparity between the GenomeScope estimation of 49.8% repetitive sequence and the RepeatMasker estimation of 66.6% suggests that the repeat content may be underestimated by GenomeScope. Therefore, the genome size may also be underestimated by GenomeScope. Because the length of the U. diversus genome is slightly less than the prediction by GenomeScope, this possibility suggests that the length of the assembly may more closely represent the true length of the U. diversus genome than the GenomeScope prediction. The repeat content is typical of spider genomes (Supplemental Table S2).
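The repeat percentages above can be cross-checked against one another. The following minimal sketch uses only the values quoted in this paragraph; the variable names are hypothetical, and treating the unlisted remainder as other RepeatMasker categories (for example low complexity or satellite sequence) is an assumption, since this passage does not break it down.

```python
# Percent of the genome assembly per interspersed repeat class, as quoted above.
interspersed_classes = {
    "unclassified": 29.27,
    "DNA transposons": 22.73,
    "retroelements": 7.7,
}
total_interspersed = sum(interspersed_classes.values())

TOTAL_MASKED = 66.6     # percent of assembly masked by RepeatMasker
SIMPLE_REPEATS = 2.61   # percent of assembly in simple repeats

# Masked sequence not accounted for by the categories listed in the text.
remainder = TOTAL_MASKED - total_interspersed - SIMPLE_REPEATS

print(f"Interspersed repeats: {total_interspersed:.1f}%")
print(f"Masked but not itemized here: {remainder:.2f}%")
```

The three interspersed classes sum to the reported 59.7%, leaving roughly 4.3% of the assembly masked by categories that Table 4 would itemize.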

Transcriptome Sequencing and Assembly
To identify protein-coding genes, we assembled a transcriptome. To capture a wide range of transcripts, we extracted RNA from spiders at multiple developmental stages and from male and female adults. We produced an Illumina short-read sequencing library from: a whole adult female, a whole adult male, the dissected prosoma (cephalothorax) from an adult female, the dissected opisthosoma (abdomen) from the same adult female, the dissected prosoma from an adult male, the dissected opisthosoma from the same adult male, the pooled legs from the dissected male

Non-Coding RNA Annotation
We used tRNAscan-SE[65,66] to annotate transfer RNAs. We found 3,084 tRNAs coding for the standard 20 amino acids and 14 tRNAs coding for selenocysteine (TCA). We found 21 tRNAs with undetermined or unknown isotypes, 537 tRNAs with mismatched isotypes, and 57,824 putative tRNA pseudogenes. We identified no putative suppressor tRNAs. We used Barrnap[67] to annotate ribosomal RNAs. We found 114 rRNAs, of which 100 were located on the 10 pseudo-

We found 12 full-length candidate sequences, including at least one candidate for each of the seven types of spidroin used by cribellate orb-weavers, as well as candidate sequences for the ampullate spidroin (AmSp) and two proposed paracribellar spidroins (Sp_vA and Sp_vB) recently reported by Correa-Garhwal, et al. [96]. There were no gaps in the assembly interrupting our candidate spidroin sequences with the exception of Sp_vA. We performed read mapping to validate the continuity of each full-length sequence and ensure that the predicted sequences were not chimeric. In 10 cases, including the 3 minor spidroin (MiSp) candidates, a major spidroin (MaSp) MaSp-1 candidate, both MaSp-2 candidates, a tubuliform spidroin (TuSp) candidate, an aciniform spidroin (AcSp) candidate, an ampullate spidroin (AmSp) candidate, and one paracribellar spidroin candidate (Sp_vB), the full length of the predicted genomic region was spanned entirely by at least one HiFi consensus read. In the remaining cases, consisting of a pyriform spidroin (PySp) candidate, a cribellate spidroin (CrSp) candidate, a pseudoflagelliform spidroin (Pflag) candidate, and a paracribellar spidroin candidate (Sp_vA), no more than 2 HiFi reads were necessary to span the entirety of the predicted genomic region, and in each of these cases there was sufficient depth and overlap in the reads to call the region with high confidence (see Supplemental Figure S1).
The length of the coding regions for the spidroins ranged from

Aciniform silk is one of the toughest spider silks, and is typically used for wrapping prey [105]. A single exon for AcSp was identified on Chromosome 7 (Figure 4A and Table 7), consistent with the sequences in the araneid orb-weaving spiders Araneus ventricosus [106] and Argiope argentata [107], as well as the cobweb spider Latrodectus hesperus [26]. Our confidence in this sequence is high since the complete sequence was spanned entirely by multiple PacBio HiFi reads. In the repetitive region, we found 13 iterated repeats of a 357 amino acid motif, and a 14th partial repeat (Supplemental Figure S1). This differs somewhat from the results reported by Correa-Garhwal, et al., which identified 10 tandem repeats [96]. As with previous reports on the structure of AcSp[26,106-108], we also found that the repeats are remarkably well-homogenized. After removal of the signal peptide between Ser-23 and Arg-24, the remaining N-terminal domain secondary structure includes 5 alpha-helices, and a C-terminal domain consisting of 4 alpha-helices, which is consistent with the structure found in other AcSps [106].

We found a single candidate for Pflag on Chromosome 7 (Figure 4A and Table 7). Our results with respect to the structure of Pflag differ slightly from those reported by Correa-Garhwal, et al. While they reported a roughly 2600 aa protein sequence consisting of 48 repeats, we determined that the internal structure consists of 91 repeats, each of which ranges between 39 and 70 aa in length. We also found that these repeats are composed of two parts: a glycine-poor spacer region followed by a glycine-rich repeat region with variations on the motif PSSGGXGG.
We do not have full-length transcripts available in our own data to offer support for one or the other structure, although we do have multiple HiFi consensus reads that extend the majority of the sequence and into the terminal regions suggesting that our assembled length is accurate.

10 is 20,195 nt (Figure 4A and Table 7). The U. diversus CrSp gene was predicted to be a 2-exon gene, with a single 83 nt intron, which is consistent with what we found by manual inspection. While the entire CrSp locus was not spanned by single HiFi reads, we are still confident in the sequence produced, since no more than 2 reads were required to span the entire sequence. The N-terminal region of the predicted protein product consists of 874 aa and the C-terminal region consists of 296 aa (Table 7). The long N-terminal domain is consistent with that found in Octonoba spp, which were also found to have coding regions more than 2 kbp[97]. We found an internal

Draglines are produced by the major ampullate gland which produces two major ampullate . This silk has extremely high tensile strength and elasticity and is commonly used for the primary load-bearing parts of the web such as the frame and radii. It is also the primary silk produced by spiders when they are navigating their environment [5]. We found three candidates for major ampullate spidroin (MaSp). Based on previous work that identified multiple distinct classes of MaSps, we were able to assign one of our candidate sequences to the

The MaSp-1 candidate is the single spidroin sequence we were not able to call as a complete sequence. In our annotation, the sequence appears as a two-exon gene, with the 5' sequence and 3' sequence found in different reading frames; however, a close inspection of the data suggests that this is not likely to be correct. We found instead that there is a large region to which the PacBio HiFi reads mapped poorly.
There is consensus between the reads that indicates sequence found in the reference assembly that is not found in the reads. However, it is not clear from inspection exactly where the boundaries should be called for this region. This is likely an artifact of the assembly, since the reference was assembled from polymerase-based sequencing which is susceptible to polymerase-slippage.

Minor ampullate silk has lower strength, but greater extensibility, and is composed of spidroin made by the minor ampullate gland. While it is commonly used for the construction of the auxiliary spiral in orb-weavers, it is used for prey wrapping by cob-weavers[117]. We found three candidates for minor ampullate spidroin (MiSp). All three MiSp loci were located near one another on Chromosome 1 (Figure 4A).

The first candidate, MiSp-1 is a single exon sequence (

The second and third candidates, MiSp-2a and MiSp-2b, shared a nearly identical amino acid composition, which was slightly different from that of MiSp-1. Both are single exon sequences (Table 7). Their hydrophobicity profiles are also slightly different from MiSp-1.

Pyriform silk serves as an adhesive compound used to adhere silk lines to one another, or to substrate that holds the web [5]. We found one single exon candidate pyriform spidroin (PySp) on Chromosome 1 (Figure 4A), which is consistent in size with a prior PySp sequence reported from Araneus ventricosus [103]. We found that the internal repetitive region was preceded by a Q-rich

Tubuliform silk is used to encase the egg sac and is spun from tubuliform glands. We found a single candidate for tubuliform spidroin (TuSp) on Chromosome 2 (Figure 4A and Table 7) which is a single exon. We found an internal region that was composed of 10 repeats, ranging from 262 aa to 302 aa in length. This is consistent with other reported TuSp repeats, which have been observed between 176 and 375 residues, although it seems that the typical TuSp module is

shares considerable synteny with A. bruennichi chromosomes 3-5, while U. diversus 2 is biased for shared synteny with A. bruennichi chromosomes 1-2 and 11-12. Even though D. plantarius diverged more recently from U. diversus (Figure 1), there appears to be a greater degree of chromosomal rearrangement between these two species. However, two outliers are U. diversus 3 and 10 which share nearly exclusive synteny with D. plantarius 12 and 5, respectively. The large degree of synteny between U. diversus 3 and 10 with M. bourneti X1 and X2 is a strong indication that these two chromosomes are the sex chromosomes for U. diversus.

Sex Chromosomes
The most common and likely ancestral system of chromosomal sex determination in spiders is ♂X1X2/♀X1X1X2X2 [48]. However, spiders exhibit a diversity of sex-determining systems, with some including Y chromosomes and others possessing up to 13 X chromosomes [132]. Usually, these systems are identified by karyotyping [48,132] (Figure 3). The genetic basis of sex determination and chromosomal dosage compensation is unknown for spiders. Determining the genetic identity of X chromosomes has been challenging, in part due to significant levels of shared synteny between sex chromosomes and autosomes [48] (Figure 5), as well as a paucity of spider genomes with chromosome-level scaffolds. Sex-linked scaffolds from more fragmented genomes have been identified by quantifying the relative difference in read depth from sperm with or without the X chromosomes [133]. In this approach, the ratio of scaffold reads from sperm nuclei without X chromosomes (based on flow cytometry) to the sum of reads from all nuclei produces two peaks. The lower peak is from reads that were under-sampled due to mapping to X chromosomes. Similarly, the X chromosomes for A. bruennichi (also ♂X1X2/♀X1X1X2X2) were recently identified through disparities in read coverage of X chromosomes between males and females [48]. In principle, because males have only one copy of each X chromosome, the average read depth for scaffolds from these chromosomes should be half that of autosomes. To determine the sex chromosomes in U. diversus, we assembled an Illumina short-read library from a single male spider and mapped the reads onto the 10 assembled pseudochromosomes (Figure 6).

While 8 of the 10 pseudochromosomes had a median read depth of 40 ± 2, pseudochromosomes 3 and 10 were outliers, with read depths of 36 and 33, respectively. In contrast, read depths from females were comparable for all chromosomes (Figure 6).
If all regions of these pseudochromosomes were exclusively unique to X chromosomes, the expected read depth in males would have been ~20. However, our observations are consistent with those in A. bruennichi, where some X-associated scaffolds showed the expected lower read depth in males, while several other X-associated scaffolds had read depths comparable to autosomes [48] (Figure 5). Why we and others observe this phenomenon is not well understood. One possibility is that reads mapping to homologous autosomal and pseudoautosomal regions on the X chromosomes decrease the expected depth disparity between autosomes and X chromosomes, leading to a disparity of ~75% rather than 50%, which is closer to what we observe. The higher-than-expected read depth could also be due to mis-assembly of these pseudochromosomes; however, very little linkage was observed between pseudochromosomes 3 and 10 in the Hi-C data (Figure 3). A third possibility is that spiders may have somatic chromosome copy number variation, as observed in birds and aging mammals [134,135]. Despite these caveats, the lower median read depth in males for pseudochromosomes 3 and 10 is a strong indicator that these represent the two X chromosomes for U. diversus.

Prior work with Stegodyphus mimosarum (also ♂X1X2/♀X1X1X2X2) identified sex-linked scaffolds based on lower read depth of sperm lacking X chromosomes [133]. When genes identified on these X-linked S. mimosarum scaffolds were mapped onto the U. diversus pseudochromosomes (Table 5), 62% of these genes mapped onto pseudochromosomes 3 and 10 (Figure 6B).
This large fraction of predicted X-linked genes shared between two distantly related species of spiders is a strong indicator not only that pseudochromosomes 3 and 10 are likely to be the X chromosomes, but also that the genetic composition of these chromosomes has remained fairly stable amongst spiders. Since two X chromosomes were recently identified in A. bruennichi, we compared the genetic composition (Supplemental Table S6) and synteny between the X chromosomes identified in both species (Figure 6). In addition to shared X-linked genes (Supplemental Table S6, Figure 6B), A. bruennichi scaffolds 9 and 10 appear to share considerable synteny with U. diversus pseudochromosomes 3 and 10 (Figure 6C), while sharing little synteny with the predicted autosomes (Figure 6C). This also appears to be true of the sex chromosomes X1 and X2 from M. bourneti; however, this genome is not currently annotated. Based on this conserved X-linked synteny, we predict that chromosomes 5 and 12 from D. plantarius are X chromosomes, as are chromosomes 1 and 9 from L. elegans (Figure 6C). When chromosomal rearrangements have occurred, they appear to have been largely confined to rearrangements between sex chromosomes. The sex chromosomes themselves share little synteny with each other (Figure 5), which indicates they are not the result of an ancient duplication, but there appears to be selective pressure to ensure that when chromosomal rearrangements do occur, they occur between sex chromosomes. When we compared genes in the syntenic regions of the sex chromosomes from D. plantarius and L. elegans to the 534 conserved genes from U. diversus, A. bruennichi and S. mimosarum, the number of X-linked genes shared between all five species was reduced only to 526 genes, indicating there is selective pressure to retain these genomic regions on sex chromosomes.

However, some syntenic blocks are shared with autosomes.
One of these syntenic blocks is the Hox gene cluster located on chromosome 10 for both U. diversus and A. bruennichi. The presence of a Hox cluster on a sex chromosome was surprising since these genes play critical roles in development. Therefore, either dosage compensation is needed in males, or dosage disparity between males and females plays a role in developmental sexual dimorphism.
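
The read-depth argument used earlier in this section can be sketched in a few lines of Python. This is a minimal illustration, not the pipeline used for the analysis: the autosomal depth of 40 for every autosome is a placeholder standing in for the reported 40 ± 2, the 0.95 flagging threshold is hypothetical, and `x_unique_fraction` assumes a simple mixing model in which a fraction f of a chromosome is X-unique (male depth ratio 0.5) and the remainder behaves like autosomal sequence (ratio 1.0), so that the observed ratio is 1 - 0.5 f.

```python
from statistics import median

# Illustrative per-chromosome median read depths for the male library.
# 36 and 33 are the values reported for pseudochromosomes 3 and 10;
# 40.0 for every other chromosome is a placeholder for "40 +/- 2".
MALE_MEDIAN_DEPTH = {f"chr{i}": 40.0 for i in (1, 2, 4, 5, 6, 7, 8, 9)}
MALE_MEDIAN_DEPTH.update({"chr3": 36.0, "chr10": 33.0})


def depth_ratios(depth_by_chrom):
    """Divide each chromosome's median depth by the median across chromosomes."""
    baseline = median(depth_by_chrom.values())
    return {chrom: depth / baseline for chrom, depth in depth_by_chrom.items()}


def flag_low_depth(ratios, threshold=0.95):
    """Return chromosomes whose depth ratio is below a (hypothetical) threshold."""
    return sorted(chrom for chrom, ratio in ratios.items() if ratio < threshold)


def x_unique_fraction(ratio):
    """Invert ratio = 1 - 0.5 * f (assumed mixing model) and clamp f to [0, 1]."""
    return min(1.0, max(0.0, 2.0 * (1.0 - ratio)))


ratios = depth_ratios(MALE_MEDIAN_DEPTH)
candidates = flag_low_depth(ratios)
for chrom in candidates:
    print(chrom, round(ratios[chrom], 3), round(x_unique_fraction(ratios[chrom]), 2))
```

Under this assumed model, a fully X-unique chromosome would give a ratio of 0.5 (depth ~20 against an autosomal depth of 40), whereas the reported depths of 36 and 33 correspond to ratios of 0.9 and 0.825, which is why the observed disparity is much smaller than the idealized one.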

In insects, the primary sex chromosome dosage sensor is sex lethal (sxl), which then triggers a cascade of sex-defining signaling events leading to sexually dimorphic expression of genes and/or splice variants. While no sxl homologue has been found in spider genomes (including U. diversus), other genes involved in sexual dimorphism, such as doublesex (dsx), are present. Thus, while sex-determining systems may be conserved, the mechanism spiders use for sensing X:autosome ratio differences to trigger these circuits remains unknown, but relevant genes are likely in the regions containing the 526 genes conserved amongst the five species. Of the 526 shared X-linked genes in these five species, 14 are predicted to be DNA/RNA-binding, and may play a role in sex determination. However, non-protein-coding RNAs (such as Xist in mammals) can also act as effectors for dosage compensation; therefore, the sex-trigger mechanism in spiders may involve not the 526 conserved protein-coding genes, but unidentified small RNA genes in these regions. Regardless, the X-linked genes shared between these five species (Supplemental Table S6) will be a valuable resource for comparative analysis to identify conserved genes that serve as sex-specifying triggers for spiders. Uncovering how spiders

invertebrates [140,141]. Spiders exhibit considerable morphological and behavioral sexual dimorphism that is based on a multiple-sex chromosome system. Understanding the genetic underpinnings of spider sexual development will contribute to a fuller understanding of how chromosomal sex determination can evolve independently in different species. Here we provide evidence for the identities of sex chromosomes in U. diversus and leverage this information to identify 14 candidate DNA-binding genes that are shared between three divergent species of spiders.
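
The shared X-linked gene counts discussed above (534 genes across three species, 526 across five) amount to a set intersection over ortholog groups, and the 62% figure is the fraction of mapped genes that land on the candidate X chromosomes. The sketch below illustrates both calculations with toy data. The ortholog group identifiers (`OG1`, `OG2`, ...) and chromosome placements are hypothetical, and how orthologs were actually assigned is not shown here.

```python
def shared_genes(x_linked_by_species):
    """Return ortholog groups that are X-linked in every species provided."""
    gene_sets = [set(genes) for genes in x_linked_by_species.values()]
    return set.intersection(*gene_sets) if gene_sets else set()


def fraction_on(chrom_by_gene, genes, target_chroms):
    """Fraction of placed genes that fall on the target chromosomes."""
    placed = [g for g in genes if g in chrom_by_gene]
    if not placed:
        return 0.0
    hits = sum(1 for g in placed if chrom_by_gene[g] in target_chroms)
    return hits / len(placed)


# Toy X-linked ortholog groups per species (hypothetical identifiers).
x_linked = {
    "U. diversus": {"OG1", "OG2", "OG3", "OG4"},
    "A. bruennichi": {"OG1", "OG2", "OG3", "OG5"},
    "S. mimosarum": {"OG1", "OG2", "OG3", "OG6"},
    "D. plantarius": {"OG1", "OG2", "OG7"},
    "L. elegans": {"OG1", "OG2", "OG3"},
}

three_species = {k: x_linked[k] for k in ("U. diversus", "A. bruennichi", "S. mimosarum")}
core_three = shared_genes(three_species)
core_five = shared_genes(x_linked)

# Toy placement of S. mimosarum X-linked genes on U. diversus pseudochromosomes.
placement = {"OG1": "chr3", "OG2": "chr10", "OG3": "chr1", "OG6": "chr3"}
on_x = fraction_on(placement, x_linked["S. mimosarum"], {"chr3", "chr10"})

print(sorted(core_three), sorted(core_five), on_x)
```

Adding species can only shrink or preserve the intersection, so a small drop when two more species are included is the pattern one would expect if the X-linked gene content is conserved.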
Our genome will facilitate comparative studies and meets a specific need in the field for a greater representation of genomes from the UDOH+RTA clade, which represents nearly half of all known spider species (Figure 1) [43]. We expect that the highly contiguous draft genome and transcriptome datasets we produced for U. diversus will serve as a valuable resource for continuing research into the evolution, development, and physiology of spiders, as well as a vital

We soaked embryos in Grace's insect medium (Gibco) containing 0.1% colchicine for 2 hours. We then added an equal volume of hypotonic solution. After 15 min, we transferred the embryos to a 3:1 ethanol:acetic acid solution for 1 hour. After fixing, we transferred embryos to gelatin-coated microscope slides and dissociated them in a drop of 45% acetic acid. We used siliconized coverslips to squash the dissociated tissue and briefly froze the slides in liquid nitrogen. After removing the slides from LN2, we immediately removed the coverslips with a razor blade and transferred the slides as quickly as possible to 95% ethanol. We then performed a step-down series from 95% ethanol to 70%, 35%, and finally to Grace's insect medium to return the tissue to an aqueous solution. We then transferred the slides to a 1 µg/mL DAPI solution. After a 10-minute incubation, we transferred the slides to de-ionized water to rinse and mounted coverslips

of Buffer ATL. We then added 20 µL Proteinase K and briefly vortexed the sample. We next incubated the sample overnight at 56 °C with 900 rpm shaking on a ThermoMixer C (Eppendorf, 5382000023). After the overnight incubation, we briefly centrifuged the sample to spin down condensate on the tube. We next transferred 200 µL of lysate to a fresh 2 mL sample tube and followed the manufacturer's protocol for manual purification of HMW DNA from fresh or frozen tissue.
We estimated DNA quality using a NanoDrop One Microvolume UV-Vis Spectrophotometer and quantified DNA using a Qubit 4 Fluorometer (ThermoFisher) with a Quant-iT dsDNA HS Assay Kit. We also measured DNA quality, quantity, and fragment length distributions using the Agilent TapeStation 4200 System with Genomic DNA ScreenTape and reagents before proceeding to library preparation. A typical preparation from a 20 mg spider

The suspension was spun again at 16,000 x g at 4 °C for 2 min and the supernatant discarded. The spider tissue pellet was combined with 20 µL Proteinase K and 150 µL Buffer PL1 and resuspended by pipetting with a P200 wide bore pipette tip. The tissue was incubated on a ThermoMixer at 55 °C with 900 rpm mixing for 1 hour. After lysis, 20 µL RNaseA was added, and the lysate was mixed by pipetting with a P200 wide bore pipette tip. The lysate was incubated at RT for 3 min. After RNaseA incubation, 25 µL Buffer SB was added, the lysate was vortexed with 5 x 1 sec pulses, and then centrifuged at 16,000 x g at 4 °C for 5 min. The supernatant (~200 µL) was transferred to a 70 µm filter (Fisher # NC1444112) set in a new 1.5 mL Protein LoBind tube (Eppendorf # 022431081). The tube with the 70 µm filter was spun on a mini-centrifuge (Ohaus # FC5306) for 1 sec and then the filter was discarded. 50 µL Buffer BL3 was added to the cleared lysate and the tube was inversion mixed 10X. The tube was then incubated on a ThermoMixer at 55 °C with 900 rpm mixing for 5 min. After incubation, the tube was allowed to come to RT, which took about 2 min. The tube was spun for 1 sec on a mini-centrifuge to spin down condensate from the lid. One 5 mm Nanobind disk was added to the tube followed by 250 µL isopropanol and then the tube was inversion mixed 5X. The tube was then rocked on a platform rocker (ThermoScientific # M48725Q) at RT and max speed for 30 min.
The DNA-bound Nanobind disk was washed according to handbook directions with one 500 µL CW1 wash and one 500 µL CW2 wash. The tube with the disk was tap spun for 2 x 1 sec to dry the disk. The DNA was eluted with 50 µL Buffer EB and incubated at RT overnight. The next day, the eluate was pipette mixed with a standard bore pipette tip 5X and then quantitated with NanoDrop and Qubit dsDNA BR assay and then sized by pulsed-field gel electrophoresis.

We then submitted the DNA sample to the University of Maryland School of Medicine Genomics Core Facility for PacBio HiFi sequencing. There, they size-selected the DNA using a Sage Science BluePippin with a 9 kb high-pass cutoff. They prepared the sequencing library using the Express

of several spider species were used to provide seed sequences (Supplemental Table S3). The resulting mitogenome sequences assembled by NOVOPlasty were compared for consensus. The

We used the ExPASy web server tool ProtScale to find the amino acid composition of each sequence, as well as to estimate the hydrophobicity using the Kyte-Doolittle method [196,197].

To the editors of GigaScience:
This letter accompanies our revised manuscript entitled "Chromosome-level genome and the identification of sex chromosomes in Uloborus diversus." We appreciate the insightful comments and criticisms provided by the reviewers, and believe we have addressed all of their concerns in the following ways:
1) We modified the text to address errors and ambiguities, and to clarify points raised by the reviewers.
2) Figure 1: Corrected to include L. erectus.
3) Figure 4: Spidroin section was modified to include absent spidroins.
4) Figure 5: Synteny maps for U. diversus and L. erectus were added.
5) Figure 6: Read depth distribution for female chromosomes was added, along with synteny maps for the sex chromosomes.
6) Supplemental Figure 1: This appeared to be absent in the original submission, and was included in the revised version.
7) Spidroin_sequences.txt: Text file was added of U. diversus spidroin protein sequences.
We thank the reviewers for their suggestions, and we believe the advised changes we made have improved the manuscript. We thank you for considering our manuscript once again for publication, and look forward to hearing from you soon.
Sincerely,
Andrew Gordus and Jeremiah Miller, on behalf of the authors

RESPONSE TO REVIEWERS
We appreciate the helpful comments from the Reviewers, which have improved the accuracy and clarity of the manuscript. We also thank the Reviewers for their positive comments on our study.
We address the specific points raised by the Reviewers below:

Reviewer #1
This paper presents the first uloborid spider genome--and it is a chromosome level assembly. Genomes of this family are important because the orb web is supposedly independently and convergently evolved in this group. Although my expertise is not in the technology and informatics of genome sequencing, it appears to be well done.
We thank the reviewer for their enthusiasm and insightful suggestions. We have made suggested changes, as outlined below.

Table S1 Number of Componenet Sequences--typo
Thank you for catching these errors. They have been corrected in Figure 1 and Table S1.

Text single exon We found a --typo
Thank you for drawing our attention to this typo. The correct punctuation has been added on line 460.

can be ascribed by --can be inferred by?
Thank you for this suggestion. This introductory sentence has been rewritten to provide clearer language on line 478. In doing so, this choice of words has been removed.

an Araneid orb-weaver--araneid usually not capitalized
We appreciate this correction. The change has been made to the text on line 508. In addition, we identified three additional occurrences of the same mistake, on lines 24 (found in the Abstract), 317 (found in the AcSp section), and 666 (found at the end of the Whole Genome Duplication section in Discussion), which have also been corrected.

♂X1X2/♀X1X1X2X2.[48] should be ♂X1X2/♀X1X1X2X2 [48].
Thank you for catching this. We have made the correction to the punctuation on line 526, addressing the placement of the reference.

Thank you for bringing our attention to this omission. A statement regarding the average integrated percent of mass accounted for by the major peak in the TapeStation results and the fragment length around which the peak is centered has been added on lines 730 and 731.
Any homologs of the Pyriform spidroin (PySp) in Acanthoscurria? Piriform silk attachment points are a synapomorphy of araneomorph or "true" spiders. Liphistiomorph and mygalomorph spiders do not (cannot?) make point attachments, and the inability to make point attachments either to substrate or silk-silk point attachments probably constrains/ed the evolution of web architectures in non-araneomorph spiders. Therefore finding homologs to PySp spidroins in non-araneomorph spiders is of great interest to explain araneomorph web architecture diversity.
Thank you, yes this is an excellent point. We initially held off on too much spidroin analysis since we were aware of another manuscript on U. diversus spidroins [1], and had agreed with the first author that we would avoid too much analysis in this area to avoid scooping them. However, since the paper was recently published, we have included some commentary on this on lines 450 to 455, and have cited their paper.
Likewise, tubuliform spidroin (TuSp) is probably a synapomorphy of entelegyne spiders, with derived female genitalia--a "flow-though" sperm management system. Eggsacs occur widely in non-entelegyne spiders, so it is a mystery why entelegynes have specialized spigots, glands, and spidroins for the same purpose. Indeed, the particular function of tubuliform silk is not clear. Any thoughts on this?
Thank you, this is an interesting point about entelegyne spiders. We can only speculate, but compared to mygalomorphs, which primarily lay their egg sacs in burrows that are well protected from the elements, araneomorphs produce egg sacs that are more exposed and are either hung on webs, placed on terrestrial substrates, or carried by the spiders. Tubuliform silk may have evolved to provide enhanced protection for the eggs relative to mygalomorph egg sacs, which rely on the burrow for added protection. This is highly speculative though, and we are unsure whether it is appropriate to include in the main text.
It is good to see attention paid to the mitochondrial genome, as many whole genome studies ignore it. In spiders, early work claimed that tRNA's appeared to be peculiar. Masta
Thank you for the suggestion. In the original text we did state the tRNAs we identified also lacked the 3′ aminoacyl acceptor stems (T-arms), but we have added a few sentences to clarify this odd observation on lines 262 to 268. Shrunken tRNAs occur in other animals as well, such as nematodes. However, why this occurs is currently unknown.
Finally, any comments on evidence for or against the convergent evolution of the orb web? Homology between the pseudoflagelliform and flagelliform spidroins would be pertinent. The intro does raise expectations that some of the macro / larger evolutionary questions will be addressed in the paper, but many, see above, are only cursory or not too much. Perhaps include a sentence in intro acknowledging this, but saying that this paper intends to present the genome and address sex chromosomes, but other topics? For example the sections on some of the spidroins do not extensively discuss comparisons with other spider genomes.
Thank you, yes we were vague on this point, and added text to address this in the spidroin and discussion sections (lines 332 to 335, and 634 to 649). As mentioned earlier, we were a little too circumspect on this point due to a desire not to overlap too much with Correa-Garhwal's work, but now that it's published, we feel free to discuss this at greater length.

Reviewer #2
In this study, the authors generated huge genome sequencing data and RNA-seq data and provided a genome assembly with rather complicated merging approach, of a spider with novel phylogenetic position. The genome undoubtedly added novel and important resources for deep understanding of spider evolution. However, there are still severe issues that need to be addressed. 1. There are huge sequencing data from different samples. However, I don't think that marge of different assemblies is good for a final qualified genome. Given high heterozygosity, that illumina data and ONT data from different individuals is quite difficult to use for assembling a clean genome. As shown in Table 2, assembly by Hify approach is not obviously inferior compared with the merged one, but obviously much better in avoiding redundancy. I strongly suggest that the author adopt the genome assembly of Hify data from one individual, instead of merging two sets of assemblies. Illumina and Nanopore assembly may be helpful in fully deciphering silk proteins.
Thank you, we apologize for the confusion on this point. The final assembly is ultimately based on the single-individual PacBio HiFi data. We used the assembly of ONT and Illumina data to patch gaps in the scaffolds produced from PacBio HiFi data. Overall, the gap-closing process patched 2617 gaps, inserting 31.4 Mbp of sequence, which amounts to only about 1.5% of sequence added to the HiFi assembly. The small amount of sequence added from the Illumina+ONT assembly improved the contiguity and likely resolved repetitive regions in places where the HiFi reads failed to span them. We expect that the additional sequence added to the assembly had no impact on our conclusions.
We have added language to clarify the process by which the SAMBA tool improves the PacBio IPA assembly, removing the misleading term "merge" and replacing it with the clarifying phrase "patch gaps", as well as the quantification of gaps patched, sequence length added, and percent of the genome for which this added sequence accounts on lines 182 to 186.
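
As a quick sanity check on the gap-patching figures quoted above, the arithmetic can be reproduced as follows. This is only a back-of-the-envelope sketch: the three inputs are the numbers stated in the response, "about 1.5%" is treated as exactly 0.015, and the derived values are implied by those inputs rather than reported assembly statistics.

```python
# Figures quoted in the response to Reviewer #2.
GAPS_PATCHED = 2617        # number of gaps patched
INSERTED_BP = 31.4e6       # sequence inserted, in base pairs
FRACTION_ADDED = 0.015     # "about 1.5%", treated as exact for illustration

# Mean amount of sequence inserted per patched gap.
mean_patch_bp = INSERTED_BP / GAPS_PATCHED

# HiFi assembly length implied by the quoted percentage (a derived value).
implied_hifi_assembly_bp = INSERTED_BP / FRACTION_ADDED

print(f"mean patch length: {mean_patch_bp:,.0f} bp")
print(f"implied HiFi assembly length: {implied_hifi_assembly_bp / 1e9:.2f} Gbp")
```

On these inputs each patch adds roughly 12 kb on average, and the implied HiFi assembly length is on the order of 2 Gbp.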

2.
Proportion of repeats are somewhat affected by the quality of assembly. The high heterozygous genome assembly is complicated merged by diverse batch of data, so the real quality might be not as good as the author described. The quality of repeat is especially hard to evaluate. Hence the statements on genome size  are not convictive.
Thank you, and again we apologize for the confusion. On lines 190 to 198 we described the statistics of the assembly. We agree that draft genome assemblies typically do not capture all of the sequence of a genome; for example, highly repetitive subtelomeric and centromeric regions are typically missing. Although draft genomes have limitations, they have proven incredibly valuable as the basis for an understanding of the genetics and evolution of hundreds of plants, animals, and other species. We used the sequencing data that we had generated within our budget to the best extent possible to produce a contiguous de novo assembly whose quality exceeds most other published arachnid genomes. In addition, the assembled genome is consistent with that predicted by GenomeScope

3.
About the assembly of RNA-seq data. The authors get huge amounts of data. However, it is not so helpful to obtain novel transcripts if the data is saturated. More importantly, assembly of short reads is even not so useful to obtain long transcripts.
We agree that de novo assembly of short-read Illumina RNA-seq data is an error-prone process, and the data were generated before Iso-Seq was affordable. However, when a high-quality genome is available, as is the case for the present study, transcript assembly becomes a much easier problem. We used the Trinity software package [2] to assemble transcripts from RNA-seq data guided by the genome assembly. Trinity is quite capable of assembling full-length transcripts with the help of a well-assembled genome.

4.
As to whole genome duplication. The authors did not provided solid evidence supporting that WGD occurred in U. diversus genome. They only demonstrated two hox clusters therein. The synteny analysis was quite confusing which is not helpful in confirmation of WGD. They need to provide more solid genome-wide evidence, or otherwise totally downplay the statements.